Hands-on Exercise 3: Processing and Visualising Flow Data

Published

November 30, 2023

Modified

November 30, 2023

1 Overview

Spatial interaction describe quatitatively the flow of people, material, or information between locations in geographical space.

Conditions for Spatial Flows

Three interdependent conditions are necessary for a spatial interaction to occur:

Features

  • Locations: A movement is occurring between a location of origin and a location of destination (i=origin; j =destination)
  • Centroid: Abstraction of the attributes of a zone at a point
  • Flows: Expressed by a valued vector Tij representing an interaction between locations i and j
  • Vectors: A vector Tij links two centroids and has a value assigned to it (50) which can represents movements

1.1 Task

In this hands-on exercise, we will learn how to build an OD matrix by using Passenger Volume by Origin Destination Bus Stops data set downloaded from LTA DataMall. By the end of this hands-on exercise, we will be able:

  • to import and extract OD data for a selected time interval,
  • to import and save geospatial data (i.e. bus stops and mpsz) into sf tibble data frame objects,
  • to populate planning subzone code into bus stops sf tibble data frame,
  • to construct desire lines geospatial data from the OD data, and
  • to visualise passenger volume by origin and destination bus stops by using the desire lines data.

1.2 Loading R Packages

pacman::p_load(tmap, sf, DT, stplanr,
               performance,
               ggpubr, tidyverse)
  • tmap for creating thematic maps; useful for static and interactive maps.
  • sf for importing, integrating, processing and transforming geospatial data.
  • DT for interactive data tables
  • stplanr for sustainable transport planning; provides functions and tools for analysis and visualisation of transport projects
  • performance for model performance measurement
  • ggpubr for visualisation
  • tidyverse for importing, integrating, wrangling and visualising data.

1.3 Preparing Flow Data

Note: Using October 2023 data because Postman API couldn’t find Oct 2022 data, maybe too long ago :(

odbus <- read_csv("data/aspatial/origin_destination_bus_202310.csv")
glimpse(odbus)
Rows: 5,694,297
Columns: 7
$ YEAR_MONTH          <chr> "2023-10", "2023-10", "2023-10", "2023-10", "2023-…
$ DAY_TYPE            <chr> "WEEKENDS/HOLIDAY", "WEEKDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR       <dbl> 16, 16, 14, 14, 17, 17, 17, 7, 14, 14, 10, 20, 20,…
$ PT_TYPE             <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE      <chr> "04168", "04168", "80119", "80119", "44069", "2028…
$ DESTINATION_PT_CODE <chr> "10051", "10051", "90079", "90079", "17229", "2014…
$ TOTAL_TRIPS         <dbl> 3, 5, 3, 5, 4, 1, 24, 2, 1, 7, 3, 2, 5, 1, 1, 1, 1…

odbus tibble data frame shows that the values in ORIGIN_PT_CODE and DESTINATON_PT_CODE are in numeric data type. Hence, the code chunk below is used to convert these data values into character data type.

odbus$ORIGIN_PT_CODE <- as.factor(odbus$ORIGIN_PT_CODE)
odbus$DESTINATION_PT_CODE <- as.factor(odbus$DESTINATION_PT_CODE) 

Recheck to confirm that the 2 variables have indeed been updated:

glimpse(odbus)
Rows: 5,694,297
Columns: 7
$ YEAR_MONTH          <chr> "2023-10", "2023-10", "2023-10", "2023-10", "2023-…
$ DAY_TYPE            <chr> "WEEKENDS/HOLIDAY", "WEEKDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR       <dbl> 16, 16, 14, 14, 17, 17, 17, 7, 14, 14, 10, 20, 20,…
$ PT_TYPE             <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE      <fct> 04168, 04168, 80119, 80119, 44069, 20281, 20281, 1…
$ DESTINATION_PT_CODE <fct> 10051, 10051, 90079, 90079, 17229, 20141, 20141, 1…
$ TOTAL_TRIPS         <dbl> 3, 5, 3, 5, 4, 1, 24, 2, 1, 7, 3, 2, 5, 1, 1, 1, 1…

For our study, we will extract commuting flows on weekday and between 6 and 9 o’clock.

odbus6_9 <- odbus %>%
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 6 &
           TIME_PER_HOUR <= 9) %>%
  group_by(ORIGIN_PT_CODE,
           DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))

datatable allows for interactive tables:

Show the code
datatable(
  odbus6_9,
  filter='top')

We will save the output in rds format for future use, and reimport the saved rds file into R environment:

write_rds(odbus6_9, "data/rds/odbus6_9.rds")
odbus6_9 <- read_rds("data/rds/odbus6_9.rds")

1.4 Working with Geospatial Data

Point Data

busstop <- st_read(dsn = "data/geospatial",
                   layer = "BusStop") %>%
  st_transform(crs = 3414)
Reading layer `BusStop' from data source 
  `C:\kytjy\ISSS624\Hands-on_Ex\Hands-on_Ex3\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21

Polygon data

mpsz <- st_read(dsn = "data/geospatial",
                   layer = "MPSZ-2019") %>%
  st_transform(crs = 3414)
Reading layer `MPSZ-2019' from data source 
  `C:\kytjy\ISSS624\Hands-on_Ex\Hands-on_Ex3\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84
mpsz
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21 / Singapore TM
First 10 features:
                 SUBZONE_N SUBZONE_C       PLN_AREA_N PLN_AREA_C       REGION_N
1              MARINA EAST    MESZ01      MARINA EAST         ME CENTRAL REGION
2         INSTITUTION HILL    RVSZ05     RIVER VALLEY         RV CENTRAL REGION
3           ROBERTSON QUAY    SRSZ01  SINGAPORE RIVER         SR CENTRAL REGION
4  JURONG ISLAND AND BUKOM    WISZ01  WESTERN ISLANDS         WI    WEST REGION
5             FORT CANNING    MUSZ02           MUSEUM         MU CENTRAL REGION
6         MARINA EAST (MP)    MPSZ05    MARINE PARADE         MP CENTRAL REGION
7                   SUDONG    WISZ03  WESTERN ISLANDS         WI    WEST REGION
8                  SEMAKAU    WISZ02  WESTERN ISLANDS         WI    WEST REGION
9           SOUTHERN GROUP    SISZ02 SOUTHERN ISLANDS         SI CENTRAL REGION
10                 SENTOSA    SISZ01 SOUTHERN ISLANDS         SI CENTRAL REGION
   REGION_C                       geometry
1        CR MULTIPOLYGON (((33222.98 29...
2        CR MULTIPOLYGON (((28481.45 30...
3        CR MULTIPOLYGON (((28087.34 30...
4        WR MULTIPOLYGON (((14557.7 304...
5        CR MULTIPOLYGON (((29542.53 31...
6        CR MULTIPOLYGON (((35279.55 30...
7        WR MULTIPOLYGON (((15772.59 21...
8        WR MULTIPOLYGON (((19843.41 21...
9        CR MULTIPOLYGON (((30870.53 22...
10       CR MULTIPOLYGON (((26879.04 26...

Combine Busstop and mpsz

busstop_mpsz <- st_intersection(busstop, mpsz) %>%
  select(BUS_STOP_N, SUBZONE_C) %>%
  st_drop_geometry()
Show the code
datatable(busstop_mpsz)

Save the output in rds format for future use:

write_rds(busstop_mpsz, "data/rds/busstop_mpsz.rds")  

Append planning subzone code from busstop_mpsz onto odbus6_9

od_data <- left_join(odbus6_9 , busstop_mpsz,
            by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE,
         ORIGIN_SZ = SUBZONE_C,
         DESTIN_BS = DESTINATION_PT_CODE)

Duplicates Check

Check for duplicates to prevent double counting:

duplicate <- od_data %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()

duplicate
# A tibble: 1,186 × 4
   ORIGIN_BS DESTIN_BS TRIPS ORIGIN_SZ
   <chr>     <fct>     <dbl> <chr>    
 1 11009     01341         1 QTSZ01   
 2 11009     01341         1 QTSZ01   
 3 11009     01411         4 QTSZ01   
 4 11009     01411         4 QTSZ01   
 5 11009     01421        17 QTSZ01   
 6 11009     01421        17 QTSZ01   
 7 11009     01511        19 QTSZ01   
 8 11009     01511        19 QTSZ01   
 9 11009     01521         2 QTSZ01   
10 11009     01521         2 QTSZ01   
# ℹ 1,176 more rows

If duplicated records are found, the code chunk below will be used to retain the unique records.

od_data <- unique(od_data)

Update od_data with planning subzone codes

od_data <- left_join(od_data , busstop_mpsz,
            by = c("DESTIN_BS" = "BUS_STOP_N")) 
duplicate <- od_data %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()

duplicate
# A tibble: 1,350 × 5
   ORIGIN_BS DESTIN_BS TRIPS ORIGIN_SZ SUBZONE_C
   <chr>     <chr>     <dbl> <chr>     <chr>    
 1 01013     51071         2 RCSZ10    CCSZ01   
 2 01013     51071         2 RCSZ10    CCSZ01   
 3 01112     51071        66 RCSZ10    CCSZ01   
 4 01112     51071        66 RCSZ10    CCSZ01   
 5 01112     53041         4 RCSZ10    BSSZ01   
 6 01112     53041         4 RCSZ10    BSSZ01   
 7 01121     51071         8 RCSZ04    CCSZ01   
 8 01121     51071         8 RCSZ04    CCSZ01   
 9 01121     82221         1 RCSZ04    GLSZ05   
10 01121     82221         1 RCSZ04    GLSZ05   
# ℹ 1,340 more rows

Retain unique records:

od_data <- unique(od_data)

Aggregate Data

od_data <- od_data %>%
  # Rename column for better clarity
  rename(DESTIN_SZ = SUBZONE_C) %>%
  # Remove NAs
  drop_na() %>% 
  # Group and summarise number of trips at each O/D level 
  group_by(ORIGIN_SZ, DESTIN_SZ) %>%
  summarise(MORNING_PEAK = sum(TRIPS))

od_data
# A tibble: 21,079 × 3
# Groups:   ORIGIN_SZ [310]
   ORIGIN_SZ DESTIN_SZ MORNING_PEAK
   <chr>     <chr>            <dbl>
 1 AMSZ01    AMSZ01            2694
 2 AMSZ01    AMSZ02           10591
 3 AMSZ01    AMSZ03           14980
 4 AMSZ01    AMSZ04            3106
 5 AMSZ01    AMSZ05            7734
 6 AMSZ01    AMSZ06            2306
 7 AMSZ01    AMSZ07            1824
 8 AMSZ01    AMSZ08            2734
 9 AMSZ01    AMSZ09            2300
10 AMSZ01    AMSZ10             164
# ℹ 21,069 more rows

Save the output in rds format for future use, and reimport into R environment:

write_rds(od_data, "data/rds/od_data.rds")
od_data <- read_rds("data/rds/od_data.rds")

1.5 Visualising Spatial Interaction

We will not plot the intra-zonal flows, i.e. where the origin and destination are the same (eg origin = AMSZ01 and destination = AMSZ01)

The code chunk below will be used to remove intra-zonal flows.

od_data1 <- od_data[od_data$ORIGIN_SZ!=od_data$DESTIN_SZ,]
Note

The comma , after the condition is significant. In R’s data frame syntax, the format for subsetting is [rows, columns]. When you place a condition before the comma, it applies to rows. The comma itself then implies that you’re not applying any specific filter to the columns – meaning you want all columns.

flowLine <- od2line(flow = od_data1, 
                    zones = mpsz,
                    zone_code = "SUBZONE_C")
tm_shape(mpsz) +
  tm_polygons() +
flowLine %>%  
tm_shape() +
  tm_lines(lwd = "MORNING_PEAK",
           style = "quantile",
           scale = c(0.1, 1, 3, 5, 7, 10),
           n = 6,
           alpha = 0.3)

When the flow data are very messy and highly skewed like the one shown above, it is wiser to focus on selected flows, for example flow greater than or equal to 5000 as shown below.

tm_shape(mpsz) +
  tm_polygons() +
flowLine %>%  
  filter(MORNING_PEAK >= 5000) %>%
tm_shape() +
  tm_lines(lwd = "MORNING_PEAK",
           style = "quantile",
           scale = c(0.1, 1, 3, 5, 7, 10),
           n = 6,
           alpha = 0.3)